Generalized Max Pooling
State-of-the-art patch-based image representations involve a pooling
operation that aggregates statistics computed from local descriptors. Standard
pooling operations include sum- and max-pooling. Sum-pooling lacks
discriminability because the resulting representation is strongly influenced by
frequent yet often uninformative descriptors, but only weakly influenced by
rare yet potentially highly-informative ones. Max-pooling equalizes the
influence of frequent and rare descriptors but is only applicable to
representations that rely on count statistics, such as the bag-of-visual-words
(BOV) and its soft- and sparse-coding extensions. We propose a novel pooling
mechanism that achieves the same effect as max-pooling but is applicable beyond
the BOV and especially to the state-of-the-art Fisher Vector -- hence the name
Generalized Max Pooling (GMP). It involves equalizing the similarity between
each patch and the pooled representation, which is shown to be equivalent to
re-weighting the per-patch statistics. We show on five public image
classification benchmarks that the proposed GMP can lead to significant
performance gains with respect to heuristic alternatives.

Comment: (to appear) CVPR 2014 - IEEE Conference on Computer Vision & Pattern Recognition (2014)
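The equalization idea above can be sketched as ridge-regularized least squares: find a pooled vector whose dot product with every patch descriptor is (approximately) one, which amounts to re-weighting the per-patch statistics. The following is a minimal NumPy sketch under that reading of the abstract, not the authors' implementation; the function name and the regularizer `lam` are illustrative choices:

```python
import numpy as np

def generalized_max_pooling(phi, lam=1.0):
    """GMP-style pooling: find xi such that every patch descriptor
    has (roughly) unit similarity with the pooled vector, Phi @ xi ~ 1.
    Solved as ridge-regularized least squares:
    xi = (Phi^T Phi + lam I)^{-1} Phi^T 1."""
    n, d = phi.shape
    ones = np.ones(n)
    return np.linalg.solve(phi.T @ phi + lam * np.eye(d), phi.T @ ones)
```

With a small `lam`, a descriptor that occurs many times contributes no more to the pooled similarity than one that occurs once, which is exactly the equalizing effect the abstract attributes to max-pooling.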
Deep Fishing: Gradient Features from Deep Nets
Convolutional Networks (ConvNets) have recently improved image recognition
performance thanks to end-to-end learning of deep feed-forward models from raw
pixels. Deep learning is a marked departure from the previous state of the art,
the Fisher Vector (FV), which relied on gradient-based encoding of local
hand-crafted features. In this paper, we discuss a novel connection between
these two approaches. First, we show that one can derive gradient
representations from ConvNets in a similar fashion to the FV. Second, we show
that this gradient representation actually corresponds to a structured matrix
that allows for efficient similarity computation. We experimentally study the
benefits of transferring this representation over the outputs of ConvNet
layers, and find consistent improvements on the Pascal VOC 2007 and 2012
datasets.

Comment: To appear at BMVC 201
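The "structured matrix" claim above can be illustrated with the simplest case: the gradient of the loss with respect to a fully connected layer's weight matrix is the outer product of the back-propagated error and the layer's input activations, so similarities between two such gradient representations factor into two small dot products. This is a simplified illustration of the idea, not the paper's exact construction:

```python
import numpy as np

def fc_gradient_feature(activations, error_signal):
    """Gradient of the loss w.r.t. a fully connected layer's weights:
    the outer product error @ activations^T (a rank-1 matrix)."""
    return np.outer(error_signal, activations)

def factored_similarity(a1, e1, a2, e2):
    """Frobenius inner product of two rank-1 gradient matrices,
    computed without ever materializing them:
    <e1 a1^T, e2 a2^T>_F = (e1 . e2) * (a1 . a2)."""
    return float(e1 @ e2) * float(a1 @ a2)
```

The factored form makes kernel computations between gradient features cheap: two vector dot products instead of a full matrix inner product.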
Handwritten word-image retrieval with synthesized typed queries
We propose a new method for handwritten word-spotting which does not require prior training or gathering examples for querying. More precisely, a model is trained “on the fly” with images rendered from the searched words in one or multiple computer fonts. To reduce the mismatch between the typed-text prototypes and the candidate handwritten images, we make use of: (i) local gradient histogram (LGH) features, which were shown to model word shapes robustly, and (ii) semi-continuous hidden Markov models (SC-HMM), in which the typed-text models are constrained to a “vocabulary” of handwritten shapes, thus learning a link between both types of data. Experiments show that the proposed method is effective in retrieving handwritten words, and the comparison to alternative methods reveals that the contribution of both the LGH features and the SC-HMM is crucial. To the best of the authors’ knowledge, this is the first work to address this issue in a non-trivial manner.
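The local gradient histogram features mentioned above can be sketched as follows: grid the word image into cells and, in each cell, accumulate a magnitude-weighted histogram of gradient orientations. This is a generic, simplified sketch of that family of descriptors, not the paper's exact parameterization; `cell` and `bins` are illustrative defaults:

```python
import numpy as np

def local_gradient_histograms(img, cell=4, bins=8):
    """Simplified LGH-style descriptor: split the image into
    cell x cell blocks and build a magnitude-weighted histogram
    of (unsigned) gradient orientations per block."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi      # fold into [0, pi)
    h, w = img.shape
    feats = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            a = ang[y:y + cell, x:x + cell].ravel()
            m = mag[y:y + cell, x:x + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi), weights=m)
            feats.append(hist)
    return np.concatenate(feats)
```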
Label embedding for text recognition
The standard approach to recognizing text in images consists in first classifying local image regions into candidate characters and then combining them with high-level word models such as conditional random fields (CRF). This paper explores a new paradigm that departs from this bottom-up view. We propose to embed word labels and word images into a common Euclidean space. Given a word image to be recognized, the text recognition problem is cast as one of retrieval: find the closest word label in this space. This common space is learned using the Structured SVM (SSVM) framework by enforcing matching label-image pairs to be closer than non-matching pairs. This method presents the following advantages: it does not require costly pre- or post-processing operations, it allows for the recognition of never-seen-before words, and the recognition process is efficient.
Experiments are performed on two challenging datasets (one of license plates and one of scene text) and show that the proposed method is competitive with standard bottom-up approaches to text recognition.
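The pairwise constraint described above (matching label-image pairs must be closer than non-matching ones) can be sketched as a stochastic hinge-loss update on a bilinear compatibility matrix. This is a minimal sketch of a ranking-style SSVM step under that reading, not the paper's exact objective; the learning rate and margin are illustrative:

```python
import numpy as np

def ssvm_embedding_step(W, img, label_pos, label_neg, lr=0.1, margin=1.0):
    """One stochastic update of the embedding matrix W.
    Images are scored against labels via the bilinear form img^T W label;
    the matching label must beat a non-matching one by a margin
    (a simplified SSVM ranking constraint)."""
    s_pos = img @ W @ label_pos
    s_neg = img @ W @ label_neg
    if margin + s_neg - s_pos > 0:        # hinge constraint violated
        W += lr * (np.outer(img, label_pos) - np.outer(img, label_neg))
    return W
```

At recognition time, a word image is scored against every candidate label and the highest-scoring label wins, which is the retrieval view of recognition the abstract describes.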
Iterative selection of transformations for image classification
In image classification, an effective strategy for learning a classifier that is invariant to certain transformations is to augment the training set with the same examples after the transformations have been applied to them. However, when the set of possible transformations is large, it can be difficult to select a small number of relevant transformations among them while keeping the training set at a reasonable size. Indeed, not all transformations have the same impact on performance; some can even degrade it. We propose an algorithm for automatic transformation selection: at each iteration, the transformation that yields the largest performance gain is selected. We evaluate our approach on the images of the ImageNet 2010 challenge and improve top-5 accuracy from 70.1% to 74.9%.
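The iteration described above is a greedy forward selection. A minimal sketch, assuming an `evaluate(selected)` callback (hypothetical here) that retrains and scores a classifier on the training set augmented with the selected transformations:

```python
def select_transformations(candidates, evaluate, max_k=3):
    """Greedy forward selection of data-augmentation transformations.
    At each round, add the candidate whose inclusion yields the largest
    validation-score gain; stop when no candidate improves the score."""
    selected, best = [], evaluate([])
    while len(selected) < max_k:
        scores = {t: evaluate(selected + [t])
                  for t in candidates if t not in selected}
        if not scores:
            break
        t_best = max(scores, key=scores.get)
        if scores[t_best] <= best:        # no transformation helps anymore
            break
        selected.append(t_best)
        best = scores[t_best]
    return selected, best
```

The stopping rule matters: since some transformations degrade performance, the loop must be allowed to terminate before `max_k` transformations are chosen.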
An introduction to biometrics: audio- and video-based person authentication
Biometrics, which refers to identifying an individual based on his or her physical or behavioral characteristics, has gained popularity among signal processing researchers in recent years. It has also attracted increased media attention since the tragic events of September 11, 2001. We first introduce the notion of biometrics. Then, we describe the architecture of biometric systems and the metrics used to evaluate their performance. We briefly discuss the most common biometrics and the different ways to combine them to obtain multimodal systems. Finally, we present applications of biometrics.
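The evaluation metrics mentioned above are commonly the False Accept Rate (FAR), the False Reject Rate (FRR), and their crossing point, the Equal Error Rate (EER). A minimal sketch of how these are computed from genuine and impostor match scores; this illustrates the standard definitions rather than anything specific to this article:

```python
import numpy as np

def far_frr(genuine, impostor, threshold):
    """FAR: fraction of impostor scores accepted (above threshold);
    FRR: fraction of genuine scores rejected (at or below it)."""
    far = np.mean(np.asarray(impostor) > threshold)
    frr = np.mean(np.asarray(genuine) <= threshold)
    return far, frr

def equal_error_rate(genuine, impostor):
    """Sweep the threshold over all observed scores and return the
    operating point where FAR and FRR are closest (the EER)."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    rates = [far_frr(genuine, impostor, t) for t in thresholds]
    far, frr = min(rates, key=lambda r: abs(r[0] - r[1]))
    return (far + frr) / 2
```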